(57) The present invention discloses a method and system for generating predictive models on evolutive data such as stock market data. In a data processing system comprising a processor and data storage means containing a database of raw transactional data, the system of the invention provides means for creating a table of evolutive data from the raw transactional data. The table comprises a plurality of objects to be observed over a predefined period of time and a set of computational variables over which the plurality of objects are observed at successive discrete time points. The computational variables comprise a first variable to be predicted and a plurality of predictive variables. Temporary predictive models are built over the predefined period of time for each object within the table, wherein each temporary predictive model provides a factor weight associated with each predictive variable. Next, the plurality of objects is clustered based on the factor weight associated with each predictive variable, wherein each cluster includes a subset of the plurality of objects, and for each subset of the plurality of objects, a predictive model is built using the set of computational variables over the predefined period of time.



[Figure 2: overview of the predictive process. Step 200: definition of a time window. Step 202: data pre-processing. Step 204: guided clustering (temporary predictive models stock by stock, then automatic clustering on stock weights). Step 206: building robust predictive models. Step 208: price variation estimation.]



Printed by Jouve, 75001 PARIS (FR) 






EP 1 107 157 A2 






Description 
Technical field 

[0001] The present invention relates to a system and method for performing predictive analysis, and more particularly to such a system and method for stock market predictive analysis.

Background art 

[0002] Recent developments in data processing for the stock market and the adoption of continuous quotation systems such as the CAC have broadened the actors' knowledge of stock market behavior. Since the 1990s, financial research has concentrated on new topics such as "high frequency transaction data" and "market microstructure", which are characterized mainly by a large amount of financial inputs and very short intervals between market observations, such as every hour or every second.

[0003] These new lines of research made it possible to analyze the price formation process and the phenomenon of price variations (volatility) by looking at the market structure and all active financial insiders, in order to study the link between price formation and limit order books. In such a system, investors place orders by entering a limited quantity into the system with an associated price. Some of the demands and offers may not be satisfied if no counterpart exists for these orders. These unfilled orders make up the complete order book at a given time and are described by the well-known 'bid and ask' curves. The spread between the bid and ask curves provides a way to compute liquidity and volatility factors for stocks. Liquidity refers to the possibility of exchanging any volume of stocks at any time and at a minimal cost. The greater the spread, the harder the conditions for exchanging the stocks.

[0004] The general state of the prior art with respect to solving the aforementioned problem may be best illustrated and understood with reference to the following publications, which deal with stock orders. In Hausman, Lo and MacKinlay (1992), "An Ordered Probit Analysis of Transaction Stock Prices", Journal of Financial Economics, vol. 31, 319-330, the authors describe a method in which price changes are modeled directly using a statistical model known as ordered probit. This technique is used most frequently in empirical studies of a dependent variable that takes on only a finite number of values possessing a natural ordering. Such a model is the only specification that can easily capture the impact of explanatory variables on price changes while also accounting for price discreteness. In Gourieroux et al., "Etude du carnet d'ordres", Sept.-Oct. 1998, revue Banque & Marches, the relationship between the price transactions and the bid and ask functions is described, and in Engle, R.F.; Russell, J.R., "Forecasting the frequency of changes in quoted foreign exchange prices with the autoregressive conditional duration model", Journal of Empirical Finance, pp. 187-212, an Autoregressive Conditional Multinomial model for discrete valued time series is proposed in the context of generalized linear models.

[0005] The typical problem with the prior art solutions is that the intra-day predictive models are first computed stock by stock and then generalized to the whole market. This approach suffers from a considerable limitation in that similarities between stocks, as well as nuances within a similarity, are not interpreted.
[0006] Moreover, while the results obtained by such methods are easy for the actors to interpret, the prediction performance may be affected by the parametric nature of such methods.

[0007] Some statistical methods based on neural networks offer more robustness, such as the generic method described in U.S. 5,461,699 to Arbabi et al., in which a forecasting system combines a neural network with a statistical forecast. A neural network having an input layer, a hidden layer, and an output layer, with each layer having one or more nodes, is presented. Each node in the input layer is connected to each node in the hidden layer and each node in the hidden layer is connected to each node in the output layer. Each connection between nodes has an associated weight. One node in the input layer is connected to a statistical forecast that is produced by a statistical model. All other nodes in the input layer are each connected to a different historical datum from the set of historical data. When presented with input data, the neural network outputs a forecast, namely the output of the output layer nodes. The weights associated with the connections of the neural network are first adjusted by a training device. The training device applies a plurality of training sets to the neural network, each training set consisting of historical data, an associated statistical output and a desired forecast. With each set of training data, the training device determines a difference between the forecast produced by the neural network given the training data and the desired forecast, and then adjusts the weights of the neural network based on the difference.

[0008] In U.S. 5,761,442 to Barr et al., a method for financial analysis is described. A data processing system and method for selecting securities and constructing an investment portfolio is based on a set of artificial neural networks which are designed to model and track the performance of each security in a given capital market and output a parameter which is related to the expected risk adjusted return for the security. Each artificial neural network is trained using a number of fundamental and price and volume history input parameters about the security and the underlying index. The system combines the expected return/appreciation potential data for each security via an optimization process to construct an investment portfolio which satisfies predetermined aggregate statistics. The data processing system receives input from the capital market and periodically evaluates the performance of the investment portfolio, rebalancing it whenever necessary to correct performance degradation.

[0009] A problem associated with these non-parametric processes is correctly determining which variables are significant for the prediction.
The present invention is directed towards solving the aforementioned problems. It is therefore an object of the invention to provide a method for generating predictive models which is computed over groups of stocks to provide a global view of the stock market.
[0010] This object is achieved by a robust iterative method which uses parametric models to provide clear and easily interpretable results.

Summary of the invention 

[0011] The present invention discloses a method and system for generating predictive models on evolutive data. In accordance with a preferred embodiment of the present invention, the method runs in a data processing system comprising a processor and data storage means containing a database of stock market raw transactional data. The method consists in creating a table of evolutive data from the raw transactional data. The table comprises a plurality of objects to be observed over a predefined period of time and a set of computational variables over which the plurality of objects are observed at successive discrete time points. The computational variables comprise a first variable to be predicted and a plurality of predictive variables. Temporary predictive models are built over the predefined period of time for each object within the table, wherein each temporary predictive model provides a factor weight associated with each predictive variable. Next, the plurality of objects is clustered based on the factor weight associated with each predictive variable, wherein each cluster includes a subset of the plurality of objects, and for each subset of the plurality of objects, a predictive model is built using the set of computational variables over the predefined period of time.

[0012] Preferably, the predictive models are built by applying a logistic regression method or by using a generalized linear model or an ordered probit model. The set of computational variables is preferably determined using a stepwise method, while the clustering of the plurality of objects is performed using a hierarchical clustering, a relational analysis or a k-means method.

Brief description of the drawings 

[0013] Figure 1 is a block diagram of a typical digital computer utilized by a preferred embodiment of the invention.

[0014] Figure 2 is an overview of the predictive process according to the present invention.

[0015] Figure 3 is a simplified representation of a raw transactional data table as used by the process of the invention.

[0016] Figure 4 illustrates a 'bid and ask' curve for the preferred embodiment.

[0017] Figures 5-a and 5-b are detailed flow charts of the clustering step to be used in the predictive process of figure 2.

Detailed description of the invention

[0018] Before the preferred embodiment is described, the most useful terms employed in the rest of the description are defined:

Generalized linear model: the generalized linear model is a generalization of classical linear models and includes as special cases linear regression, logistic regression, ordered probit regression and other kinds of models. Such models are used to explain the variation of one variable (named the variable to be predicted or dependent variable) with a subset of other variables (named predictive variables or explanatory variables). The result of this model is a function giving a weight factor to each predictive / explanatory variable.
Learning phase: to build a statistical model like a generalized linear model, two phases are to be considered: the learning phase and the testing phase. The learning phase is the phase where the model is built. For this, historical data are used in which cases are recorded with all values known, both for the variable to be predicted and for the predictive variables. The statistical model is then built to minimize the errors between the real value of the variable to be predicted and the value estimated by the model.

Testing phase: because the errors have been minimized during the learning phase, the model error measured on the learning data is underestimated. To estimate the true model error, another independent set of historical data is used. During this phase, the model built in the learning phase is applied to all the records and the result is then compared to the real values, giving a more realistic estimate of the model error.

Logistic regression: logistic regression is a special case of the generalized linear model where the variable to be predicted is discrete and in general takes two values.
Ordered probit model: the ordered probit model is a special case of the generalized linear model where the variable to be predicted takes on only a finite number of values possessing a natural ordering.

Robust methods: statistical models like generalized linear models are designed to be the best possible when stringent assumptions apply. However, experience and further research have shown that classical techniques can behave badly when the practical situation departs from the ideal described by such assumptions. Robust methods broaden the effectiveness of statistical analyses. Robust methods more often involve iteration than do classical ones. Thus, instead of finding a solution in a single step, the method takes an initial value and successively refines it, bringing it closer and closer to the final answer.
Ask function: the ask function gives the unit price that a buyer should pay to buy a given volume immediately. This function is an increasing function and is always above the bid function.
Bid function: the bid function gives the unit price that a seller should accept to sell a given volume immediately. This function is a decreasing function and is always below the ask function.
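
The learning phase and testing phase defined above can be illustrated with a minimal sketch. It assumes a one-variable least-squares model rather than the generalized linear models of the description, and all data values and function names (`fit_line`, `mean_squared_error`) are hypothetical. It only shows the principle that the error is measured a second time on an independent set of historical records.

```python
def fit_line(xs, ys):
    """Learning phase: least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_squared_error(model, xs, ys):
    """Average squared gap between real values and model estimates."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical historical records, split into two independent sets.
learn_x, learn_y = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
test_x, test_y = [5.0, 6.0, 7.0], [5.3, 5.8, 7.4]

model = fit_line(learn_x, learn_y)                              # learning phase
learning_error = mean_squared_error(model, learn_x, learn_y)
testing_error = mean_squared_error(model, test_x, test_y)       # testing phase
```

On this hypothetical data the testing error is larger than the learning error, which is the situation the testing phase is meant to reveal.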

[0019] Referring now to the drawings, figure 1 illustrates an environment in which a preferred embodiment of the present invention operates. The preferred embodiment of the present invention operates on a computer platform 105. The computer platform 105 includes hardware units 108, including one or more Central Processing Units (CPU) 110, a Random Access Memory (RAM) 109, and an input/output (I/O) interface 111. The computer platform 105 runs with an operating system 106, and may include micro instruction code 107. A database management system 100 may be part of the micro instruction code 107 or an application program to be executed via the operating system. Raw transactional data giving status information relative to the particular application may be stored in any kind of local or remote data storage 113. Remote data storage may be accessible through modems and communication lines (not shown). The data may be collected from various sources and media such as written information, experts' evaluations, or in-house historical records. Various peripheral units 112 like terminals, disks or scanners may be connected to the computer platform 105 for inputting the data. The computer platform 105 could be a server terminal connected to multiple client CPUs. A user or an actor wishing to run the method of the invention would access the system through the I/O interface 111.
[0020] The I/O interface circuit could also be a remote terminal with an Internet-like connection.
[0021] The prediction process is now described with reference to figure 2. The process starts at step 200, wherein a time window is defined for the learning phase of the prediction model. During this time period, raw transactional data are collected to be later processed to provide a prediction model for a specific variable. The model is then operated by users for predicting the specific variable. In the preferred implementation, the time window extends over one day, but any other time period may be defined, such as several days or months, depending on the particular application.

[0022] At step 202, the raw transactional data are preprocessed to be arranged in a table of evolutive data. In the preferred implementation of stock trend analysis, the raw transactional data are a collection of information relating to the order books, as is now described with reference to figure 3.

[0023] Figure 3 is a simplified example of transactional data organized in a matrix 300 comprising 8 rows of stocks (301-1 to 301-8) and 6 columns (302, 303, 304-1 to 304-4). In the first column 302, a first variable 'Stock' identifies the objects (the stocks) contained within the rows. The second column 303 is named 'Snapshot' and stores the discrete time values (the snapshots) at which the stocks are observed (for example every five seconds along the time window, i.e. 24 hours). The last columns, noted from 304-1 to 304-4, describe the parameters observed for each stock, which are for example the current price 'P_Cur' in column 304-1, the current quantity 'Q_Cur' in column 304-2, the buy price 'P_Buy' in column 304-3 and the sell price 'P_Sel' in column 304-4.

[0024] Consider the content of a row 300-1 for a first stock 'S1'. In the second column 303, the first observation time is fixed, i.e. 9.00 am. The current price of stock 'S1' at the first snapshot is stored within the cell of column 304-1 and is equal to 10,3. In column 304-2, the current quantity of the first stock S1 at the first snapshot is equal to 1000.

[0025] Column 304-3 gives the proposed price for buying this stock at the first snapshot, for example 10,1. The last column contains the value of the proposed selling price, 10,5, at the first snapshot.

[0026] In the second row 300-2, the values of the observed parameters 304-1 to 304-4 are stored for the same stock S1 but at the next snapshot, i.e. at 9.05 am. The table is filled with each observation and, at the end of the observation time period, the last row 300-8 contains the values of the measured parameters for the last stock S2 at the last snapshot (9.15 am).
It is to be understood that, for simplification of the description, only 8 rows and 6 columns are illustrated, but this is not to be interpreted as a limitation by the skilled person, and in application the number of rows and columns may be extended.
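
The table of figure 3 could be held in memory as a list of records, one per stock and snapshot. The sketch below is only an illustration: the `Observation` class and `rows_for` helper are hypothetical names, the first row uses the values given in the description for stock S1 at 9.00 am, and the remaining rows are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    stock: str      # column 302, 'Stock'
    snapshot: str   # column 303, 'Snapshot' (HH:MM, zero-padded so it sorts)
    p_cur: float    # column 304-1, current price
    q_cur: int      # column 304-2, current quantity
    p_buy: float    # column 304-3, proposed buying price
    p_sel: float    # column 304-4, proposed selling price


table = [
    Observation("S1", "09:00", 10.3, 1000, 10.1, 10.5),  # values from the description
    Observation("S1", "09:05", 10.4, 800, 10.2, 10.6),   # hypothetical
    Observation("S2", "09:00", 25.0, 300, 24.8, 25.3),   # hypothetical
    Observation("S2", "09:05", 24.9, 450, 24.7, 25.2),   # hypothetical
]


def rows_for(table, stock):
    """Return the observations of one stock, ordered by snapshot."""
    return sorted((r for r in table if r.stock == stock), key=lambda r: r.snapshot)
```

Keeping one record per (stock, snapshot) pair makes it straightforward to extract the time series of a single stock, which is what the stock-by-stock models of the later steps operate on.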

[0027] During the preprocessing step 202, the information contained within table 300 is first exploited in the form of the well-known 'bid and ask' curves in order to compute a set of computational variables associated with each object (each stock) at each snapshot. The set of variables comprises a first variable for which the prediction is sought (the variable to be predicted) and a plurality of predictive variables which reveal some interesting characteristics of the 'bid and ask' curves. The plurality of predictive variables are exploited during the process of the present invention, as will be detailed later. In the preferred implementation, the variable to be predicted is the stock price variation between two consecutive snapshots (t, t+1).
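
As a minimal sketch of the variable to be predicted, the price variation between consecutive snapshots can be computed from the series of current prices of one stock. The prices below are hypothetical, and the binary coding in `direction_labels` is an assumption made here so that the result could feed a two-valued model such as logistic regression; the description does not specify how the variation is coded.

```python
def price_variations(prices):
    """Return prices[t+1] - prices[t] for each pair of consecutive snapshots."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


def direction_labels(variations):
    """Assumed binary coding: 1 if the price rises, 0 otherwise."""
    return [1 if v > 0 else 0 for v in variations]


prices = [10.3, 10.4, 10.4, 10.2]   # hypothetical P_Cur values of one stock
variations = price_variations(prices)
labels = direction_labels(variations)
```

A series of n snapshots yields n - 1 variations, so the last snapshot of the time window has no value of the variable to be predicted.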

[0028] Figure 4 is a simplified representation of 'bid and ask' curves at a snapshot 't'. The abscissa of curve 400 details the quantities of stocks proposed by actors for buying (Q_Buy) or for selling (Q_Sel) the stocks. The ordinate of curve 400 details the prices of stocks as proposed by actors for buying (P_Buy) or for selling (P_Sel) the stocks. The lower curve illustrates the 'bid' curve while the higher curve illustrates the 'ask' curve. From curves 400, two groups of predictive variables are computed, a static group and a dynamic group. The static group includes predictive variables well-known in the field of stock exchange such as the spread, the depth, the median price, the buying slope and the selling slope, which are described on the 'bid and ask' curves at a particular snapshot 't'. The dynamic group includes predictive variables which describe variations between two 'bid and ask' curves at two different snapshots, for example between snapshot 't' and snapshot 't-1' or between snapshot 't' and snapshot 't-2'. One preferred dynamic variable is the price and the quantity of a given stock at a given snapshot 't' as compared to a 'bid and ask' curve of a previous snapshot 't-1'.

Other dynamic predictive variables may be computed, such as for example the price variation between snapshot 't-1' and snapshot 't'.

In the preferred embodiment, the initial collected parameters 304-1 to 304-4, which relate to the prices and the quantities of stocks proposed by actors either for buying or for selling, are normalized before the static and the dynamic predictive variables are computed.
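
The 'bid and ask' curves can be sketched from the unfilled limit orders of an order book. The example below is an illustration under stated assumptions: the "unit price" for a given volume is taken to be the volume-weighted average price obtained by consuming the best orders first, the spread is taken as the best selling price minus the best buying price, and the order book contents and function names are hypothetical.

```python
def unit_price(orders, volume, highest_first):
    """Average unit price for trading `volume` immediately against `orders`.

    `orders` is a list of (price, quantity) limit orders on one side of the book.
    """
    if volume <= 0:
        raise ValueError("volume must be positive")
    remaining = volume
    cost = 0.0
    for price, quantity in sorted(orders, key=lambda o: o[0], reverse=highest_first):
        take = min(remaining, quantity)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough volume in the order book")
    return cost / volume


def ask(sell_orders, volume):
    """A buyer consumes the cheapest selling orders first."""
    return unit_price(sell_orders, volume, highest_first=False)


def bid(buy_orders, volume):
    """A seller consumes the highest buying orders first."""
    return unit_price(buy_orders, volume, highest_first=True)


# Hypothetical order book at one snapshot: (price, quantity) pairs.
sell_orders = [(10.5, 500), (10.6, 1000), (10.8, 2000)]
buy_orders = [(10.1, 400), (10.0, 800), (9.8, 1500)]

spread = min(p for p, _ in sell_orders) - max(p for p, _ in buy_orders)
```

With this construction the ask value never decreases and the bid value never increases as the volume grows, which matches the shape of the two curves described for figure 4.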
[0029] The following equations give the formulae to perform such standardisation. At each snapshot, each proposed price (buying or selling) is divided by the median price (Med-price) computed at the same snapshot as follows:

Med-price(t) = [P_Sel1(t) + P_Buy1(t)] / 2

[0030] The standardized selling price value is therefore calculated as follows:

Std-P_Sel1(t) = P_Sel1(t) / Med-price(t),

and the standardized buying price value is calculated as follows:

Std-P_Buy1(t) = P_Buy1(t) / Med-price(t)

[0031] Standardized values for prices and quantities at the different snapshots are respectively computed with corresponding equations.
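
The standardisation formulae above can be checked numerically with the example values given for stock S1 at the first snapshot (buying price 10,1 and selling price 10,5). The function names are hypothetical; the arithmetic follows the equations for Med-price, Std-P_Sel1 and Std-P_Buy1.

```python
def median_price(p_sel1, p_buy1):
    """Med-price(t) = [P_Sel1(t) + P_Buy1(t)] / 2."""
    return (p_sel1 + p_buy1) / 2


def standardize(price, med_price):
    """Std-P(t) = P(t) / Med-price(t)."""
    return price / med_price


p_buy1, p_sel1 = 10.1, 10.5          # example values for S1 at 9.00 am
med = median_price(p_sel1, p_buy1)
std_sel = standardize(p_sel1, med)
std_buy = standardize(p_buy1, med)
```

Because the median price is the midpoint of the two proposed prices, the two standardized values are symmetric around 1 and always sum to 2, which makes stocks trading at very different price levels directly comparable.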

[0032] Finally, the preprocessing step provides a set of computational variables including a first variable to be predicted and a plurality of static and dynamic predictive variables for each stock and each snapshot.
[0033] In an alternate embodiment, the rows of the transactional data table are preferably filtered to select a number of rows having the highest absolute values (positive or negative) of the variable to be predicted.
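
The filtering of the alternate embodiment can be sketched as a selection on the absolute value of the variable to be predicted. The row identifiers, variation values and the number of rows kept are hypothetical; the description does not state how many rows are selected.

```python
def keep_largest_moves(rows, n):
    """Keep the n rows whose variation is largest in absolute value.

    `rows` is a list of (row_id, variation) pairs.
    """
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)[:n]


rows = [("a", 0.1), ("b", -0.5), ("c", 0.02), ("d", 0.3)]   # hypothetical
selected = keep_largest_moves(rows, 2)
```

Sorting on the absolute value keeps strong falls as well as strong rises, as the "positive or negative" wording requires.
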
[0034] At the next step 204, a guided clustering of the data transformed by the preprocessing is performed in two operations.

In a first operation, a temporary predictive model is computed for each stock over the set of snapshots. Each temporary predictive model provides a ponderation value, or factor weight, for each predictive variable. In a second operation, an automatic clustering of the stocks is performed based on the ponderation values associated with each stock.

[0035] Figures 5-a and 5-b illustrate the guided clustering operation 204 in more detail. At step 500, a well-known stepwise logistic regression is performed on each stock over the set of snapshots. The result provides for each stock an intermediate temporary predictive model and a subset of the most discriminant predictive variables. In an alternate implementation, generalized linear models or ordered probit models may be used to provide the intermediate temporary predictive model in well-known fashion. Step 500 is run for all the stocks, and for each stock the associated subset of the most discriminant predictive variables is stored within a memory area of the database management system 100.
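
To make the notion of a per-stock model with one weight per predictive variable concrete, the sketch below fits a plain logistic regression by batch gradient ascent. It is a simplification and an assumption: it omits the stepwise variable selection of step 500, and the two predictive variables (a standardized spread and a previous price variation), the data, the learning rate and the number of epochs are all hypothetical.

```python
import math


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Return [intercept, w1, w2, ...] fitted by batch gradient ascent
    on the logistic log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            error = y - 1.0 / (1.0 + math.exp(-z))   # label minus predicted probability
            gradient[0] += error
            for j, v in enumerate(x):
                gradient[j + 1] += error * v
        weights = [w + learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical snapshots of one stock: [standardized spread, previous variation].
rows = [[0.2, 1.0], [0.8, -1.0], [0.3, 0.5], [0.9, -0.5], [0.4, 0.6], [0.7, -0.6]]
labels = [1, 0, 1, 0, 1, 0]   # 1 if the price rose at the next snapshot

weights = fit_logistic(rows, labels)
```

The fitted `weights[1:]` play the role of the factor weights: running the same fit on every stock gives one weight vector per stock, which is the input of the later clustering.
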
[0036] At step 502, a new subset is selected from among all the subsets of the most discriminant predictive variables. In the preferred embodiment, the new subset includes the most frequently stored discriminant predictive variables. At the next step 504, a logistic regression is performed in a well-known manner on each stock with the new subset of the most frequently stored discriminant predictive variables. The result provides for each stock a temporary predictive model giving a ponderation value for each predictive variable.
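
Step 502 can be sketched as a frequency count over the per-stock subsets stored at step 500. The variable names, the stored subsets and the size `k` of the new subset are hypothetical; the description does not say how many variables are retained.

```python
from collections import Counter


def most_frequent_variables(subsets_by_stock, k):
    """Return the k variables that appear in the most per-stock subsets.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(
        name for subset in subsets_by_stock.values() for name in set(subset)
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


subsets_by_stock = {
    "S1": ["spread", "depth", "buying_slope"],
    "S2": ["spread", "median_price"],
    "S3": ["spread", "depth"],
}
common_subset = most_frequent_variables(subsets_by_stock, 2)
```

Using the same subset of variables for every stock at step 504 is what makes the resulting weight vectors comparable from one stock to another.
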
[0037] At step 506 of figure 5-b, the computed weights are temporarily stored within a memory area of the database management system 100.

At the next step 508, the stocks are clustered into several groups, denoted (510, 512, 514, 516), based on the factor weights. In application, the clusters of stocks may be displayed to the users to offer a global view of the stock market.
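
The clustering of step 508 can be sketched with a plain k-means on the weight vectors, k-means being one of the clustering methods the description names. The weights below are hypothetical, and initialising the centroids with the first k points is an assumption made to keep the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]   # assumed initialisation
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:   # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical factor weights per stock, one value per predictive variable.
weights_by_stock = {
    "S1": [0.9, -0.2],
    "S2": [-0.7, 0.5],
    "S3": [1.0, -0.1],
    "S4": [-0.8, 0.6],
}
stocks = list(weights_by_stock)
assignment = k_means([weights_by_stock[s] for s in stocks], k=2)

clusters = {}
for stock, cluster_id in zip(stocks, assignment):
    clusters.setdefault(cluster_id, []).append(stock)
```

Stocks whose models weight the predictive variables in a similar way end up in the same group, which is the sense in which the clustering is guided by the temporary models.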

[0038] Referring back to figure 2, step 206 consists in building a robust predictive model for each group of stocks (clusters 510 to 516) previously computed in step 204. For each cluster, a stepwise logistic regression is performed over a subset of the transformed data previously issued from step 202 and containing all stocks of the considered group with the corresponding snapshots. The result of step 206 provides a robust model for each group of stocks.
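
The data preparation implied by step 206 can be sketched as pooling, for each cluster, the transformed rows of all its stocks, so that a single model can then be fitted per cluster. The helper name, the rows and the cluster assignment are hypothetical, and the model fitting itself is left out.

```python
def pool_rows_by_cluster(rows, cluster_of_stock):
    """Group (stock, features, label) rows by the cluster of their stock.

    Returns {cluster_id: (list_of_features, list_of_labels)}.
    """
    pooled = {}
    for stock, features, label in rows:
        xs, ys = pooled.setdefault(cluster_of_stock[stock], ([], []))
        xs.append(features)
        ys.append(label)
    return pooled


# Hypothetical transformed rows and cluster assignment.
rows = [("S1", [0.2], 1), ("S2", [0.8], 0), ("S3", [0.3], 1)]
cluster_of_stock = {"S1": 0, "S2": 1, "S3": 0}
pooled = pool_rows_by_cluster(rows, cluster_of_stock)
```

Each pooled set contains more observations than any single stock provides, which is one plausible reason the per-cluster models are described as robust.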

[0039] Finally, at step 208, the robust predictive models are stored in local or remote databases to be run by a user to estimate the trend of the variable to be predicted. The process may loop back from step 206 to step 200 to update the robust predictive models if new raw transactional data are presented to the database management system 100.

[0040] Although the present invention has been fully described above with reference to specific embodiments, other alternative embodiments will be apparent to those of ordinary skill in the art. Therefore, the above description should not be taken as limiting the scope of the present invention, which is defined by the appended claims.



Claims 

1. In a data processing system comprising a processor and data storage means containing a database of raw transactional data, a method for generating predictive models on evolutive data, the method comprising the steps of:

a) creating a table of evolutive data from the raw transactional data, the table comprising a plurality of objects to be observed over a predefined period of time and a set of computational variables over which the plurality of objects are observed at successive discrete time points, said computational variables comprising a first variable to be predicted and a plurality of predictive variables;

b) for each object within the table building a temporary predictive model over the predefined period of time, each temporary predictive model providing a factor weight associated to each predictive variable;

c) clustering the plurality of objects based on the factor weight associated to each predictive variable, each cluster including a subset of the plurality of objects; and

d) for each subset of the plurality of objects, building a predictive model over the predefined period of time using the set of computational variables.

2. The method of claim 1 wherein the step of creating a table further comprises the steps of:

a1) extracting from the database a plurality of objects to be observed over a predefined period of time;

a2) generating a set of computational variables from the raw transactional data comprising a first variable to be predicted and a plurality of predictive variables; and

a3) computing at successive discrete time points the value of each variable for the plurality of objects.

3. The method of claim 1 or 2 wherein the plurality of predictive variables comprises static variables and dynamic variables.

4. The method of claim 1 wherein step b) of building temporary predictive models comprises the step of applying a method from the group of logistic regression or generalized linear model or ordered probit model.

5. The method of claim 1 wherein step d) of building predictive models comprises the step of applying a method from the group of logistic regression or generalized linear model or ordered probit model.

6. The method of claim 1 wherein the set of computational variables is determined using a stepwise method.

7. The method of claim 1 wherein step c) of clustering the plurality of objects is performed using a hierarchical clustering.

8. The method of claim 1 wherein step c) of clustering the plurality of objects is performed using relational analysis.

9. The method of claim 1 wherein step c) of clustering the plurality of objects is performed using a k-means method.

10. The method of claim 1 wherein the predictive models are built using neural networks.

11. The method of claim 1 further comprising the step of storing the predictive models in a database for user access.

12. The method of claim 11 wherein the predictive models are used for intra day analysis of stock market.

13. In a data processing system comprising a processor and data storage means containing a database of raw transactional data, a system for generating predictive models on evolutive data comprising:

a) means for creating a table of evolutive data from the raw transactional data, the table comprising a plurality of objects to be observed over a predefined period of time and a set of computational variables over which the plurality of objects are observed at successive discrete time points, said computational variables comprising a first variable to be predicted and a plurality of predictive variables;

b) means for building a temporary predictive model over the predefined period of time for each object within the table, each temporary predictive model providing a factor weight associated to each predictive variable;

c) means for clustering the plurality of objects based on the factor weight associated to each predictive variable, each cluster including a subset of the plurality of objects; and

d) means using the set of computational variables for building a predictive model over the predefined period of time for each subset of the plurality of objects.

14. The system of claim 13 wherein the means for creating a table further comprises:

a1) means for extracting from the database a plurality of objects to be observed over a predefined period of time;

a2) means for generating a set of computational variables from the raw transactional data comprising a first variable to be predicted and a plurality of predictive variables; and

a3) means for computing at successive discrete time points the value of each variable for the plurality of objects.

15. The system of claim 13 or 14 wherein the plurality of predictive variables comprises static variables and dynamic variables.

16. The system of claim 13 wherein said means for building temporary predictive models further comprises means for applying either a logistic regression or a generalized linear model or an ordered probit model.

17. The system of claim 13 wherein said means for building predictive models further comprises means for applying either a logistic regression or a generalized linear model or an ordered probit model.

18. The system of claim 13 wherein the set of computational variables is determined using a stepwise method.

19. The system of claim 13 wherein step c) of clustering the plurality of objects is performed using a hierarchical clustering.

20. The system of claim 13 wherein step c) of clustering the plurality of objects is performed using relational analysis.

21. The system of claim 13 wherein step c) of clustering the plurality of objects is performed using a k-means method.

22. The system of claim 13 wherein the predictive models are built using neural networks.

23. The system of claim 13 further comprising means for storing the predictive models in a database for user access.

24. The system of claim 23 wherein the predictive models are used for intra day analysis of stock market.

25. A computer program product stored in memory executable by a processor for generating predictive models on evolutive data, in a database stored in a data processing system and containing raw transactional data, the product comprising:

a) means for creating a table of evolutive data from the raw transactional data, the table comprising a plurality of objects to be observed over a predefined period of time and a set of computational variables over which the plurality of objects are observed at successive discrete time points, said computational variables comprising a first variable to be predicted and a plurality of predictive variables;

b) means for building a temporary predictive model over the predefined period of time for each object within the table, each temporary predictive model providing a factor weight associated to each predictive variable;

c) means for clustering the plurality of objects based on the factor weight associated to each predictive variable, each cluster including a subset of the plurality of objects; and

d) means using the set of computational variables for building a predictive model over the predefined period of time for each subset of the plurality of objects.